
Add Zamba2 #34517

Draft · wants to merge 56 commits into base: main
Conversation

@pglorio (Contributor) commented Oct 30, 2024

What does this PR do?

This PR adds support for the Zamba2 architecture created by Zyphra Technologies.

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you read the contributor guideline, Pull Request section?
  • Was this discussed/approved via a GitHub issue or the forum? Please add a link to it if that's the case.
  • Did you make sure to update the documentation with your changes? Here are the documentation guidelines, and here are tips on formatting docstrings.
  • Did you write any new necessary tests?

Who can review?

@ArthurZucker

@pglorio pglorio marked this pull request as draft October 30, 2024 17:57
@pglorio (Contributor, Author) commented Nov 11, 2024

Hey @Arthur,

Thank you again for your help in getting Zamba2 into transformers! The PR is now finally ready to be reviewed. I added the documentation and all unit tests pass, including slow tests.

A few remarks, mostly related to modular transformers:

  1. To generate the modeling and configuration files I used utils/modular_model_converter.py from a previous commit, because the most recent version of the script (which followed a large refactoring) produces an error that I was not able to fix:
Converting src/transformers/models/zamba2/modular_zamba2.py to a single model single file format
Traceback (most recent call last):
  File "/workspace/transformers_zamba/utils/modular_model_converter.py", line 1510, in <module>
    converted_files = convert_modular_file(file_name, args.old_model_name, args.new_model_name)
  File "/workspace/transformers_zamba/utils/modular_model_converter.py", line 1447, in convert_modular_file
    for file, module in create_modules(cst_transformers).items():
  File "/workspace/transformers_zamba/utils/modular_model_converter.py", line 1387, in create_modules
    nodes_to_add, file_type, new_imports = get_class_node_and_dependencies(modular_mapper, class_name, node, files)
  File "/workspace/transformers_zamba/utils/modular_model_converter.py", line 1337, in get_class_node_and_dependencies
    new_node_dependencies, new_imports = check_dependencies_and_create_import_node(
  File "/workspace/transformers_zamba/utils/modular_model_converter.py", line 1283, in check_dependencies_and_create_import_node
    class_dependencies = {dep for dep in new_dependencies if m.matches(mapper.global_nodes[dep], m.ClassDef())}
  File "/workspace/transformers_zamba/utils/modular_model_converter.py", line 1283, in <setcomp>
    class_dependencies = {dep for dep in new_dependencies if m.matches(mapper.global_nodes[dep], m.ClassDef())}
KeyError: 'Zamba2Config'

I carefully compared Zamba2Config with the corresponding classes of other models that also use modular (such as Gemma2Config) and the format appears consistent; a minimal illustration of the failing lookup is sketched after this list. Relatedly, the utils/modular_model_converter.py in the current PR (path) is the version from the previous commit mentioned above.

  2. After running utils/modular_model_converter.py, the generated modeling and configuration files contain unintended code that I had to update; all these modifications are in this commit. In particular, the produced modeling file contains Zamba2DynamicCache, which is the correct cache for Zamba2, as well as HybridMambaAttentionDynamicCache, which is the cache of Zamba and is not relevant to Zamba2, so I deleted HybridMambaAttentionDynamicCache and related references.

  3. I ran make fixup and all zamba-related tests pass, with the exception of python utils/check_modular_conversion.py. This check fails because of the modifications mentioned in the previous point.

  4. I slightly edited the Zamba2MambaMixer compared to the original Mamba2Mixer of mamba2; the main difference is that I added these lines, which were necessary to appropriately process the mamba2 cache (note this step already existed in the torch forward in these lines).
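Regarding the KeyError in point 1, here is a minimal, self-contained sketch of the failing pattern (hypothetical stand-in data, not the converter's real state): the comprehension at line 1283 of the script indexes mapper.global_nodes directly, so a dependency name that was never registered there, apparently Zamba2Config in this run, raises KeyError instead of being filtered out.

# Hypothetical, simplified stand-in for the converter's state (not the real mapper):
global_nodes = {"Zamba2Model": "ClassDef(Zamba2Model)"}  # 'Zamba2Config' was never registered

def is_class_def(node: str) -> bool:
    # Stand-in for m.matches(node, m.ClassDef()) in the real script
    return node.startswith("ClassDef")

new_dependencies = {"Zamba2Model", "Zamba2Config"}

# Same shape as the failing comprehension: the lookup is unguarded, so the missing
# dependency raises KeyError: 'Zamba2Config' rather than being skipped.
class_dependencies = {dep for dep in new_dependencies if is_class_def(global_nodes[dep])}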

Looking forward to your feedback. Thanks so much!

@Cyrilvallez Cyrilvallez self-requested a review November 28, 2024 16:08
@Cyrilvallez (Member) commented Dec 3, 2024

Hey @pglorio, so sorry for the delay!! I think the first error you encounter comes from a wrong ruff version. Our CI uses ruff==0.5.1, so could you make sure to install the same and run it again?

EDIT: but actually, the only failing tests I see are Zamba2 tests

@Cyrilvallez (Member) left a review

Hey! Here is a first round of review comments; please check them out! For the lora layers, maybe rename them residual instead of lora to make it explicit that they are part of the original model?

You should also make sure that all the tests you added pass; this is currently not the case!

Review comments (now marked outdated/resolved) on:
  src/transformers/models/zamba2/__init__.py
  src/transformers/models/zamba2/modular_zamba2.py (8 comments)
@pglorio (Contributor, Author) commented Dec 5, 2024

Thank you @Cyrilvallez for the super helpful feedback! Please see the replies to each comment above.

It seems the zamba2-related tests all pass locally for us, but not in the CircleCI environment of the PR. The error is always

   1 failed because `IndexError` -> Dimension out of range (expected to be in range of [-2, 1], but got 2)
   9 failed because `RuntimeError` -> Sizes of tensors must match except in dimension 2. Expected size 8 but got size 1 for tensor number 1 in the list.

for all tests that fail. We investigated the discrepancy by examining more closely the test tests/models/zamba2/test_modeling_zamba2.py::Zamba2ModelTest::test_beam_sample_generate. We tried using pytest-flakefinder as suggested here, running python3 -m pytest -m 'generate' -n 8 -vvv --dist=loadfile --flake-finder --flake-runs=1000 --max-worker-restart=0 tests/models/zamba2/test_modeling_zamba2.py::Zamba2ModelTest::test_beam_sample_generate, but the test still passes. To the best of our knowledge this is the closest local reproduction of the CircleCI run. We'd appreciate a hint about what we can do to fix this discrepancy!
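For orientation (a generic PyTorch illustration, not the actual failing code in the Zamba2 mixer), the RuntimeError above is the message torch.cat produces when the tensors being concatenated disagree in a dimension other than the concatenation dimension:

import torch

a = torch.zeros(2, 8, 4)   # e.g. (batch, groups, state); illustrative shapes only
b = torch.zeros(2, 8, 1)
torch.cat([a, b], dim=2)   # fine: only the concat dimension differs

c = torch.zeros(2, 1, 4)
torch.cat([a, c], dim=2)   # RuntimeError: Sizes of tensors must match except in dimension 2.
                           # Expected size 8 but got size 1 for tensor number 1 in the list.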

Also, we are happy to rename the lora layers as you suggested here and agree it would clarify the role of this module. Would pretrained_lora also be a good option (residual might, for some people, refer to the identity stream)?

Thank you so much again!

@pglorio (Contributor, Author) commented Dec 7, 2024

We renamed lora -> adapter here and, following your suggestion in this comment, we simplified the adapter using nn.Sequential in this commit.
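For context, a minimal sketch of what a low-rank adapter expressed as a single nn.Sequential can look like (illustrative names and sizes, not the actual Zamba2 code):

import torch
import torch.nn as nn

hidden_size, rank = 64, 8  # illustrative sizes, not Zamba2's real configuration

# Down-projection followed by up-projection; the result is added to the output of the
# shared layer as a per-layer correction.
adapter = nn.Sequential(
    nn.Linear(hidden_size, rank, bias=False),
    nn.Linear(rank, hidden_size, bias=False),
)
shared_linear = nn.Linear(hidden_size, hidden_size, bias=False)

x = torch.randn(2, 10, hidden_size)
out = shared_linear(x) + adapter(x)
print(out.shape)  # torch.Size([2, 10, 64])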

We re-ran the model tests and they all pass locally, but it looks like they still fail in this PR environment.

@Cyrilvallez (Member) left a review

Nice! I pointed out some minor issues, but this is almost ready! 🤗 However, you might want to consider switching the attention implementation to our newest standard here #35235 (see e.g. the llama model as an example). This is not strictly needed, but should be quite straightforward from the examples, and will help both readability and maintenance in the long run. And since this change was introduced to make transformers compatible with high-performance inference frameworks such as vLLM and TGI, it could make Zamba2 compatible as well in the future (not sure how we will support stateful models, or whether the shapes in the Mixers are compatible, but it cannot hurt!).
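For reference, the dispatch pattern introduced by #35235 looks roughly like the sketch below, paraphrased from the llama example at the time; the exact names (ALL_ATTENTION_FUNCTIONS, config._attn_implementation) are an assumption about the then-current transformers codebase, not Zamba2 code:

from transformers.modeling_utils import ALL_ATTENTION_FUNCTIONS

def pick_attention_function(config, eager_attention_forward):
    # Each model keeps a plain eager implementation and dispatches every other backend
    # ("sdpa", "flash_attention_2", ...) through a shared registry keyed by the
    # configured attention implementation, instead of one attention subclass per backend.
    if config._attn_implementation == "eager":
        return eager_attention_forward
    return ALL_ATTENTION_FUNCTIONS[config._attn_implementation]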

Review comments (now marked outdated/resolved) on:
  src/transformers/models/zamba2/modular_zamba2.py (3 comments)
@Cyrilvallez (Member) commented

I cannot reproduce your issue with the tests, though; they are failing locally for me. Running CUDA_VISIBLE_DEVICES=2 pytest tests/models/zamba2/test_modeling_zamba2.py gives:
[screenshot of the failing test output, 2024-12-18]

@huggingface huggingface deleted a comment from github-actions bot Dec 18, 2024
@pglorio (Contributor, Author) commented Dec 20, 2024

Thank you so much for this feedback @Cyrilvallez. I realized that the issue with the unit tests was inside the torch_forward method of the mamba2 mixer (when I ran them locally, the unit tests used the cuda_kernels method instead). I fixed that method here: 1 2 3.
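As a rough, self-contained paraphrase of that dispatch (approximate logic, not the exact transformers code), the mixer only takes the fused-kernel path on GPU with the optional kernels installed, which is why local GPU runs and the CPU-only CI exercised different branches:

import torch

def pick_forward_path(device: torch.device, fast_path_available: bool) -> str:
    # Approximation of the mixer's branch selection: fused CUDA kernels on GPU when the
    # optional kernels are available, pure-PyTorch torch_forward otherwise.
    if fast_path_available and device.type == "cuda":
        return "cuda_kernels_forward"
    return "torch_forward"

print(pick_forward_path(torch.device("cpu"), fast_path_available=True))   # torch_forward (CI)
print(pick_forward_path(torch.device("cuda"), fast_path_available=True))  # cuda_kernels_forward (local GPU)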

By the way, we originally took the torch_forward from the mamba2 model, so the same issues hold there. In particular, running this:

import torch
from transformers import Mamba2Config, Mamba2ForCausalLM

config = Mamba2Config(
    num_heads=8,
    n_groups=8,
    state_size=2,
    head_dim=8,
    conv_kernel=4,
    chunk_size=8,
    vocab_size=99,
    hidden_size=32,
    num_hidden_layers=4,
    hidden_act="silu",
    hidden_dropout_prob=0.1,
    max_position_embeddings=512,
)
model = Mamba2ForCausalLM(config)

# Batch with padding positions encoded in the attention mask
inputs = {
    "input_ids": torch.tensor([[86,  6, 51,  3, 12, 15, 33, 18,  4, 92],
                               [69, 66, 49, 45, 48, 44, 61, 56, 68, 85]]),
    "attention_mask": torch.tensor([[0, 0, 1, 1, 1, 0, 0, 0, 1, 0],
                                    [0, 1, 1, 0, 1, 1, 1, 0, 1, 1]]),
}

# Greedy generation on CPU (torch_forward path) ...
outputs_cpu = model.generate(**inputs, max_new_tokens=5, return_dict_in_generate=False,
                             output_scores=False, use_cache=True, num_beams=1, do_sample=False)

# ... and on GPU (cuda kernels path), with the same inputs
model = model.to("cuda")
inputs = {key: tensor.to(device=model.device) for key, tensor in inputs.items()}
outputs_cuda = model.generate(**inputs, max_new_tokens=5, return_dict_in_generate=False,
                              output_scores=False, use_cache=True, num_beams=1, do_sample=False)

print(torch.all(outputs_cpu == outputs_cuda.cpu()).item())

returns False.
